CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to the provisional patent application Serial Number: 
60/201,867, filed on May 4, 2000. 

BACKGROUND OF THE INVENTION 

Under various situations, it is necessary to measure the difference between two
Probability Distribution Functions (PDF's). For example, in text-independent speaker
recognition using Gaussian mixture models (GMM), a given piece of speech can be
classified by comparing its GMM model with a set of given GMM models. D. A.
Reynolds and R. C. Rose, "Robust Text-Independent Speaker Identification Using Gaussian
Mixture Speaker Models," IEEE Trans. on Speech and Audio Processing, Vol. 3, No. 1, pp.
72-83 (1995). Another scenario is to detect the difference among the observation
probabilities, again often characterized by GMM's, of the states of a continuous Hidden
Markov Model (HMM), so that similar states can be merged to simplify the overall model in
speech recognition tasks. Q. Huang, Z. Liu, A. Rosenberg, D. Gibbon, B. Shahraray,
"Automated Generation of News Content Hierarchy by Integrating Audio, Video, and Text
Information," Proc. of IEEE ICASSP 99, Vol. IV, pp. 3025-28 (Phoenix, March 1999).
Although much needed, there is so far no simple way to measure the distance between two
mixture PDF's.

There are three well-known properties of a distance measure, namely non-negativity,
symmetry, and the triangular inequality. Let G(x), F(x), and H(x) be three PDF's and denote
by D(G,F) the distance between G(x) and F(x). The three properties can then be formally
expressed as:

$$D(G,F) \ge 0, \quad \text{and } D(G,F) = 0 \text{ iff } G = F \tag{1}$$

$$D(G,F) = D(F,G) \tag{2}$$

$$D(G,H) + D(H,F) \ge D(G,F) \tag{3}$$

There are different approaches to measure the difference between two PDF's. We 
summarize them into three categories. They may or may not satisfy the three distance 
properties. 

The first approach defines the distance in $L_r$ space by

$$D_{L_r}(G,F) = \left( \int_x |G(x) - F(x)|^r \, dx \right)^{1/r} \tag{4}$$

where commonly used values of r are 1 or 2. Although it satisfies all three distance
properties, $D_{L_r}$ usually has to be computed by numerical integration. Therefore, the
computational complexity can easily go out of control as the dimension increases.
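As a rough illustration of why evaluating (4) numerically becomes expensive, the following Python sketch approximates $D_{L_r}$ for two one-dimensional GMMs with a midpoint Riemann sum. The `(weight, mean, std)` representation, the integration limits and the grid size are illustrative assumptions, not values from the original text. In d dimensions, a grid of the same resolution would need `steps ** d` density evaluations.

```python
import math


def gauss_pdf(x, mean, std):
    """Density of the 1-D Gaussian N(mean, std)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def gmm_pdf(x, comps):
    """Density of a 1-D mixture given as a list of (weight, mean, std) triples."""
    return sum(w * gauss_pdf(x, m, s) for w, m, s in comps)


def lr_distance(G, F, r=2, lo=-12.0, hi=12.0, steps=4000):
    """Approximate D_Lr(G, F) = (integral of |G(x) - F(x)|^r dx)^(1/r) on [lo, hi].

    Uses the midpoint rule with `steps` cells. The cost here is `steps`
    evaluations per mixture; a d-dimensional grid would need steps**d.
    """
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        x = lo + (j + 0.5) * h  # midpoint of cell j
        total += abs(gmm_pdf(x, G) - gmm_pdf(x, F)) ** r * h
    return total ** (1.0 / r)


# The two GMMs used in the example later in this background section.
G = [(1 / 3, -2.0, 1.0), (2 / 3, 1.0, 1.0)]
F = [(1 / 3, 2.0, 1.0), (2 / 3, -1.0, 1.0)]
print(lr_distance(G, F, r=1), lr_distance(G, F, r=2))
```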

The second approach is the relative entropy or Kullback-Leibler distance (KLD). T.
M. Cover and J. A. Thomas, Elements of Information Theory (John Wiley & Sons, 1991).
It is defined as

$$D_{KL}(G,F) = \int_x G(x) \log \frac{G(x)}{F(x)} \, dx \tag{5}$$

The straightforward KLD defined above satisfies only the first property. By extending
the original KLD to $D_{KL}(G,F) + D_{KL}(F,G)$, one can force it to meet the symmetry
property. Although the third property still does not hold, the extended KLD is popular in
many applications for lack of better alternatives. To compute the KLD, different
approximation schemes are often employed. For example, data sequences $T_G$ and $T_F$ can
be generated from models G and F, and then the average log-likelihood ratio of the
sequences with respect to G(x) and F(x) can be used to approximate the extended KLD.
That is,
$$D_{Seq}(G,F) \approx \frac{1}{N} \left[ \log \frac{P(T_G \mid G)}{P(T_G \mid F)} + \log \frac{P(T_F \mid F)}{P(T_F \mid G)} \right] \tag{6}$$



where N is the length of the data sequences $T_G$ and $T_F$. The performance of $D_{Seq}$ is
a function of both the value of N and the data generation procedure. The larger N is, the
more reliable the approximation, but at the same time the more expensive the estimation.
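To make the cost of (6) concrete, here is a minimal Python sketch of $D_{Seq}$ for one-dimensional GMMs. It assumes that $T_G$ and $T_F$ are i.i.d. samples drawn from G and F, and that each mixture is given as a list of hypothetical `(weight, mean, std)` triples. The default `n=5000` mirrors the test sequence length mentioned in section IV; the fixed seed is an illustrative choice. Every call costs on the order of n times the number of components in density evaluations.

```python
import math
import random


def gmm_pdf(x, comps):
    """Density of a 1-D mixture given as (weight, mean, std) triples."""
    return sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
               for w, m, s in comps)


def sample_gmm(comps, n, rng):
    """Draw n points: pick a component by its weight, then sample that Gaussian."""
    picks = rng.choices(comps, weights=[w for w, _, _ in comps], k=n)
    return [rng.gauss(m, s) for _, m, s in picks]


def d_seq(G, F, n=5000, seed=0):
    """Monte Carlo estimate of the extended KLD, following eq. (6)."""
    rng = random.Random(seed)
    t_g = sample_gmm(G, n, rng)  # T_G, generated from G
    t_f = sample_gmm(F, n, rng)  # T_F, generated from F
    # For i.i.d. samples, log P(T | model) is the sum of per-sample log densities.
    llr_g = sum(math.log(gmm_pdf(x, G)) - math.log(gmm_pdf(x, F)) for x in t_g)
    llr_f = sum(math.log(gmm_pdf(x, F)) - math.log(gmm_pdf(x, G)) for x in t_f)
    return (llr_g + llr_f) / n


G = [(1 / 3, -2.0, 1.0), (2 / 3, 1.0, 1.0)]
F = [(1 / 3, 2.0, 1.0), (2 / 3, -1.0, 1.0)]
print(d_seq(G, F))
```

Increasing `n` reduces the sampling noise of the estimate but raises the cost linearly, which is the trade-off described above.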

The third approach is to compute the distance directly from the respective model
parameters. Ideally, such a method achieves performance at least comparable to the other
approaches with a precise closed-form solution, which in turn leads to a much more efficient
computational procedure. Unfortunately, the existing method in this category can handle
only simplified (degenerate) cases. For example, if $G(m_1, \sigma_1)$ and
$F(m_2, \sigma_2)$ are single Gaussians from two individual PDF's, where $m_1$, $m_2$,
$\sigma_1$, and $\sigma_2$ are their corresponding means and standard deviations, the
extended KLD between G and F in this simplified single-mixture case can be computed
directly from the model parameters as

$$D_P(G,F) = \frac{(\sigma_1^2 - \sigma_2^2)^2}{\sigma_1^2 \sigma_2^2} + (m_1 - m_2)^2 \left( \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} \right) \tag{7}$$

ignoring the constant multiple.
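For reference, (7) can be coded directly. The sketch below assumes, as in the text, one-dimensional Gaussians parameterized by mean and standard deviation; the function name `d_p` is a hypothetical label. With the constant multiple written as above, the value equals twice the symmetric KL divergence $D_{KL}(G,F) + D_{KL}(F,G)$, which offers a simple way to check the formula.

```python
def d_p(m1, s1, m2, s2):
    """Closed-form distance (7) between N(m1, s1) and N(m2, s2).

    Equals 2 * (KL(G||F) + KL(F||G)) for 1-D Gaussians; the factor 2 is
    the constant multiple that the text ignores.
    """
    v1, v2 = s1 * s1, s2 * s2  # variances
    return (v1 - v2) ** 2 / (v1 * v2) + (m1 - m2) ** 2 * (1.0 / v1 + 1.0 / v2)


print(d_p(0.0, 1.0, 1.0, 1.0))  # prints 2.0
```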

Even though the computation of $D_P$ is simple and can be extended to handle Gaussians of
higher dimensions, it cannot deal with multiple-mixture PDF's. Even if the models are
simplified (using one Gaussian to approximate multiple Gaussians) so that (7) can be
applied, the outcome is often not effective. This can be illustrated with a simple example.
Consider two GMM's $G = \frac{1}{3}N(-2, 1) + \frac{2}{3}N(1, 1)$ and
$F = \frac{1}{3}N(2, 1) + \frac{2}{3}N(-1, 1)$, where $N(m, \sigma)$ is a Gaussian
distribution with mean m and standard deviation σ. Both G and F have two Gaussian
components, and they are obviously distributed quite differently. Hence, the distance
between G and F is clearly not zero. To apply (7), both G and F have to be simplified into
a single Gaussian, denoted by $G'(m_G, \sigma_G)$ and $F'(m_F, \sigma_F)$, where the new
mean and standard deviation are derived as the weighted averages of the means and standard
deviations of their components. This yields the same mean ($m_G = m_F = 0$) and the same
standard deviation ($\sigma_G = \sigma_F = 1$) for both G' and F', which leads to
$D_P(G', F') = 0$. Evidently, the measure derived this way fails to capture the obvious
difference between the two original PDF's.
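A few lines of Python reproduce this failure. The `(weight, mean, std)` encoding and the midpoint-rule $L_1$ sanity check are illustrative assumptions added here; the collapse step follows the weighted-average rule described above, and `d_p` implements (7).

```python
import math

# (weight, mean, std) triples for the two GMMs in the example above
G = [(1 / 3, -2.0, 1.0), (2 / 3, 1.0, 1.0)]
F = [(1 / 3, 2.0, 1.0), (2 / 3, -1.0, 1.0)]


def collapse(comps):
    """Single-Gaussian simplification: weighted averages of means and of stds."""
    mean = sum(w * m for w, m, _ in comps)
    std = sum(w * s for w, _, s in comps)
    return mean, std


def d_p(m1, s1, m2, s2):
    """Eq. (7) for single Gaussians (constant multiple ignored)."""
    v1, v2 = s1 * s1, s2 * s2
    return (v1 - v2) ** 2 / (v1 * v2) + (m1 - m2) ** 2 * (1.0 / v1 + 1.0 / v2)


def mixture_pdf(x, comps):
    """Density of a 1-D Gaussian mixture."""
    return sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
               for w, m, s in comps)


def l1_gap(A, B, lo=-10.0, hi=10.0, steps=2000):
    """Midpoint-rule estimate of the integral of |A(x) - B(x)|."""
    h = (hi - lo) / steps
    xs = (lo + (j + 0.5) * h for j in range(steps))
    return sum(abs(mixture_pdf(x, A) - mixture_pdf(x, B)) * h for x in xs)


print(d_p(*collapse(G), *collapse(F)))  # prints 0.0: the collapsed models coincide
print(l1_gap(G, F))                     # clearly positive: the mixtures differ
```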

Therefore, there is a need to develop other alternatives that can effectively measure 
the difference between mixture PDF's directly from their model parameters. 

SUMMARY OF THE INVENTION 

In accordance with our invention, for two mixture-type probability distribution
functions (PDF's) G and H,

$$G(x) = \sum_{i=1}^{N} \mu_i g_i(x), \qquad H(x) = \sum_{k=1}^{K} \gamma_k h_k(x)$$

where G is a mixture of N component PDF's $g_i(x)$, H is a mixture of K component PDF's
$h_k(x)$, and $\mu_i$ and $\gamma_k$ are the corresponding weights that satisfy

$$\sum_{i=1}^{N} \mu_i = 1 \quad \text{and} \quad \sum_{k=1}^{K} \gamma_k = 1,$$

we define their distance, $D_M(G,H)$, as

$$D_M(G,H) = \min_{\omega} \sum_{i=1}^{N} \sum_{k=1}^{K} \omega_{ik} \, d(g_i, h_k)$$

where $d(g_i, h_k)$ is the element distance between component PDF's $g_i$ and $h_k$, and
the weights $\omega_{ik}$ satisfy

$$\omega_{ik} \ge 0, \quad 1 \le i \le N, \; 1 \le k \le K$$

and

$$\sum_{k=1}^{K} \omega_{ik} = \mu_i, \; 1 \le i \le N, \qquad \sum_{i=1}^{N} \omega_{ik} = \gamma_k, \; 1 \le k \le K.$$

The definition of distance can be usefully applied to various sets of real world data as 
demonstrated below. 



BRIEF DESCRIPTION OF THE DRAWINGS 

These and other objects, features and advantages of our invention can be better 
understood from the following detailed description of the invention in which: 

Fig. 1 is an illustration of the relationship between two probability distribution
functions useful in understanding equations 8-12;

Fig. 2 contains four behavior plots comparing our metric with previously defined metrics;

Fig. 3 is a plot of retrieval performance; and

Fig. 4 is a block diagram of a computer system useful in practicing our invention.

DETAILED DESCRIPTION OF THE INVENTION 

This detailed description is organized as follows. In section I we present our metric.
In section II we demonstrate that the new metric satisfies the three distance properties
under certain constraints. A comparison between the proposed and other existing measures is
given in section III. Some preliminary results from applying the new metric to audio-based
content retrieval applications are shown in section IV.



I. Parametric Distance Metric for Mixture PDF

Suppose G(x) and H(x) are two PDF's of mixture type,

$$G(x) = \sum_{i=1}^{N} \mu_i g_i(x), \qquad H(x) = \sum_{k=1}^{K} \gamma_k h_k(x) \tag{8}$$

where G(x) is a mixture of N component PDF's $g_i(x)$, H(x) is a mixture of K component
PDF's $h_k(x)$, and $\mu_i$ and $\gamma_k$ are the corresponding weights that satisfy

$$\sum_{i=1}^{N} \mu_i = 1 \quad \text{and} \quad \sum_{k=1}^{K} \gamma_k = 1 \tag{9}$$



Figs. 1(a) and 1(b) illustrate the structure of G(x) and H(x). A one-dimensional
example is shown for each PDF at the bottom of the figures. Our invention is inspired by
two observations. First, the distance between a pair of element PDF's ($g_i(x)$ and
$h_k(x)$) is easy to compute. For example, the distance between two single Gaussians can be
determined directly from their parameters. Fig. 1(c) shows one simple distance
measurement between two element PDF's. Second, the distance between two mixture-type
PDF's is essentially determined by their components. Although the element distances are
not obviously related to the overall distance, our invention defines a framework in which a
meaningful overall distance can be computed directly from all the element distances. For
simplicity, we drop the x term in the rest of the formulae in this patent. Denote the
distance between a pair of components, say $g_i$ and $h_k$, by $d(g_i, h_k)$.

In accordance with our invention, the overall distance between G and H is defined as

$$D_M(G,H) = \min_{\omega} \sum_{i=1}^{N} \sum_{k=1}^{K} \omega_{ik} \, d(g_i, h_k) \tag{10}$$

subject to

$$\omega_{ik} \ge 0, \quad 1 \le i \le N, \; 1 \le k \le K \tag{11}$$

$$\sum_{k=1}^{K} \omega_{ik} = \mu_i, \; 1 \le i \le N, \qquad \sum_{i=1}^{N} \omega_{ik} = \gamma_k, \; 1 \le k \le K \tag{12}$$



This framework is a fully connected network in which any component $g_i$ in mixture
G can interact with any component $h_k$ in H via the weighted element distance
$\omega_{ik} d(g_i, h_k)$. The degree of interaction decreases with the element distance and
grows with the mixture weights $\mu_i$ and $\gamma_k$. The weights $\omega_{ik}$ are
ultimately determined by optimizing with respect to the constraints in (11) and (12). The
framework is visualized in Fig. 1(d).

The solution is posed as a linear programming problem. There are many algorithms
available to solve it efficiently, such as the simplex tableau method. In our formulation,
there are a total of N×K free parameters ($\omega_{ik}$'s) and N+K equality constraints, of
which only N+K-1 are independent. According to linear programming theory, a basic optimal
solution, such as the one returned by the simplex method, has at most N+K-1 non-vanishing
parameters among the N×K. The problem has a solution because (1) we can easily find a
feasible vector that satisfies all the constraints, $\omega_{ik} = \mu_i \gamma_k$, and
(2) the objective function is bounded: since the $\omega_{ik}$ are non-negative and sum to
one, its value lies between $\min_{ik} d(g_i, h_k)$ and $\max_{ik} d(g_i, h_k)$.
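The optimization in (10)-(12) has the form of a transportation problem. The sketch below solves it in pure Python with successive shortest paths, a min-cost-flow method used here in place of the simplex tableau mentioned above so that the example needs no LP library; any correct LP solver should reach the same minimum. The function name `mixture_distance`, the list-of-lists layout of the element distances and the tolerance are assumptions for illustration.

```python
import math


def mixture_distance(mu, gamma, d, tol=1e-12):
    """Solve (10)-(12): minimise sum_ik w[i][k] * d[i][k] subject to
    w >= 0, sum_k w[i][k] = mu[i] and sum_i w[i][k] = gamma[k].

    mu, gamma: mixture weights of G and H (each summing to 1).
    d: N x K list of element distances d(g_i, h_k).
    Returns (D_M, w).
    """
    n, m = len(mu), len(gamma)
    supply, demand = list(mu), list(gamma)
    w = [[0.0] * m for _ in range(n)]
    while max(supply) > tol and max(demand) > tol:
        # Multi-source Bellman-Ford on the residual graph; nodes 0..n-1 are
        # the g_i and nodes n..n+m-1 are the h_k.
        dist = [0.0 if s > tol else math.inf for s in supply] + [math.inf] * m
        pred = [None] * (n + m)
        for _ in range(n + m):
            changed = False
            for i in range(n):
                for k in range(m):
                    # Forward edge g_i -> h_k: unlimited capacity, cost d[i][k].
                    if dist[i] + d[i][k] < dist[n + k] - tol:
                        dist[n + k], pred[n + k] = dist[i] + d[i][k], i
                        changed = True
                    # Backward edge h_k -> g_i: undo flow already on (i, k).
                    if w[i][k] > tol and dist[n + k] - d[i][k] < dist[i] - tol:
                        dist[i], pred[i] = dist[n + k] - d[i][k], n + k
                        changed = True
            if not changed:
                break
        # Cheapest H component that still needs mass; trace its path back.
        k_end = min((k for k in range(m) if demand[k] > tol),
                    key=lambda k: dist[n + k])
        path, node = [], n + k_end
        while pred[node] is not None:
            path.append((pred[node], node))
            node = pred[node]
            if len(path) > n + m:
                raise RuntimeError("cycle in shortest-path tree")
        start = node  # a G component that still has mass to send
        # Push as much mass as the source, the sink and backward edges allow.
        delta = min(supply[start], demand[k_end])
        for u, v in path:
            if u >= n:
                delta = min(delta, w[v][u - n])
        for u, v in path:
            if u < n:
                w[u][v - n] += delta
            else:
                w[v][u - n] -= delta
        supply[start] -= delta
        demand[k_end] -= delta
    total = sum(w[i][k] * d[i][k] for i in range(n) for k in range(m))
    return total, w


print(mixture_distance([0.5, 0.5], [0.5, 0.5], [[1.0, 2.0], [3.0, 100.0]])[0])  # 2.5
```

In the usage line, pairing $g_1$ with $h_1$ first would lead to a cost of 50.5; the backward (flow-cancelling) edges let the solver reroute mass and reach the optimum of 2.5. The element distances are left as an input, matching the framework's independence from any particular element measure.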

The proposed framework for the distance metric is general. Since the overall
distance is constructed from element distance measures, its generality comes from the fact
that the element distance measure is left unspecified. Depending on different application
needs, appropriate element distance measures, which may even be non-parametric, can be
plugged in, and the overall distance between two mixture PDF's can be computed using the
same framework. Furthermore, there is no requirement on the specific type of the element
distributions, nor do the two PDF's need to be of the same type.

II. Proof of Distance Properties

If the element distance $d(g_i, h_k)$ between two mixture components $g_i$ and $h_k$
satisfies the three distance properties (1)-(3), the overall mixture distance $D_M(G,H)$
does as well.

The proof of the first two properties is straightforward. To prove the triangular
inequality property, we need to show for any three mixture PDF's G, H, and F that:

$$D_M(G,H) + D_M(H,F) \ge D_M(G,F) \tag{13}$$

The definitions of G and H are the same as in (8). F is similarly defined as
$F(x) = \sum_{j=1}^{M} \xi_j f_j(x)$, where F is a mixture of M component PDF's $f_j$ and
the $\xi_j$ are weights that satisfy

$$\sum_{j=1}^{M} \xi_j = 1 \tag{14}$$

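Applying definition (10) to both (G, H) and (H, F), we have

$$D_M(G,H) = \sum_{i=1}^{N} \sum_{k=1}^{K} \omega_{ik} \, d(g_i, h_k), \qquad D_M(H,F) = \sum_{k=1}^{K} \sum_{j=1}^{M} v_{kj} \, d(h_k, f_j) \tag{15}$$

where the optimal weights $\omega_{ik} \ge 0$ and $v_{kj} \ge 0$ satisfy

$$\sum_{i=1}^{N} \omega_{ik} = \sum_{j=1}^{M} v_{kj} = \gamma_k, \qquad \sum_{k=1}^{K} \omega_{ik} = \mu_i, \qquad \sum_{k=1}^{K} v_{kj} = \xi_j \tag{16}$$

Then, assuming every $\gamma_k > 0$ (a component of H with zero weight can simply be
dropped) and using $\sum_{j} v_{kj} = \sum_{i} \omega_{ik} = \gamma_k$ from (16) to expand
each sum,

$$\begin{aligned} D_M(G,H) + D_M(H,F) &= \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} \frac{\omega_{ik} v_{kj}}{\gamma_k} \left[ d(g_i, h_k) + d(h_k, f_j) \right] \\ &\ge \sum_{i=1}^{N} \sum_{j=1}^{M} \left( \sum_{k=1}^{K} \frac{\omega_{ik} v_{kj}}{\gamma_k} \right) d(g_i, f_j) \end{aligned} \tag{17}$$

where the inequality follows from the triangular inequality of the element distance. Let

$$\alpha_{ij} = \sum_{k=1}^{K} \frac{\omega_{ik} v_{kj}}{\gamma_k} \tag{18}$$

then (17) can be rewritten as

$$D_M(G,H) + D_M(H,F) \ge \sum_{i=1}^{N} \sum_{j=1}^{M} \alpha_{ij} \, d(g_i, f_j) \tag{19}$$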



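For any set of $\alpha_{ij} \ge 0$ that satisfies the equality constraints in (12) for the
pair (G, F), the following inequality is also true, since $D_M(G,F)$ is the outcome of the
optimization:

$$D_M(G,F) \le \sum_{i=1}^{N} \sum_{j=1}^{M} \alpha_{ij} \, d(g_i, f_j) \tag{20}$$

The variables $\alpha_{ij}$ defined in (18) are clearly non-negative, and they indeed
satisfy the required constraints:

$$\sum_{j=1}^{M} \alpha_{ij} = \sum_{k=1}^{K} \frac{\omega_{ik}}{\gamma_k} \sum_{j=1}^{M} v_{kj} = \sum_{k=1}^{K} \omega_{ik} = \mu_i, \qquad \sum_{i=1}^{N} \alpha_{ij} = \sum_{k=1}^{K} \frac{v_{kj}}{\gamma_k} \sum_{i=1}^{N} \omega_{ik} = \sum_{k=1}^{K} v_{kj} = \xi_j \tag{21}$$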



Putting (19) and (20) together, we obtain (13).
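The key step of the proof is that the composed weights $\alpha_{ij}$ of (18) are feasible for the pair (G, F). The short Python check below builds $\alpha$ from two feasible weight tables and confirms the row and column sums stated in (21). The function name `compose_weights` and the specific numbers are arbitrary illustrative choices, not values from the original text.

```python
def compose_weights(omega, v, gamma):
    """Eq. (18): alpha[i][j] = sum_k omega[i][k] * v[k][j] / gamma[k].

    omega: N x K weights feasible for (G, H); v: K x M weights feasible
    for (H, F); gamma: weights of H, assumed strictly positive.
    """
    n, m = len(omega), len(v[0])
    return [[sum(omega[i][k] * v[k][j] / g for k, g in enumerate(gamma))
             for j in range(m)] for i in range(n)]


# Arbitrary feasible example: mu = [0.4, 0.6], gamma = [0.5, 0.5], xi = [0.3, 0.7].
omega = [[0.4, 0.0], [0.1, 0.5]]  # rows sum to mu, columns to gamma
v = [[0.3, 0.2], [0.0, 0.5]]      # rows sum to gamma, columns to xi
alpha = compose_weights(omega, v, [0.5, 0.5])
print([sum(row) for row in alpha])                             # should match mu
print([sum(alpha[i][j] for i in range(2)) for j in range(2)])  # should match xi
```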



III. Benchmark Against Existing Measures 

While we have proved that the new metric possesses certain properties, we would also
like to demonstrate that it behaves similarly to other existing measures. In this
application, we compare it with the two previously defined measures: $D_{L_2}$ (equation (4))
and $D_{Seq}$ (equation (6)).

Without loss of generality, we perform the comparison on two-dimensional GMM's F and G,
each with two mixture components. The element distance used is the extended KLD defined in
(7). Specifically, F is defined as



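$$F = 0.5\,N\!\left( \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) + 0.5\,N\!\left( \begin{bmatrix} -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) \tag{22}$$

where N(u, σ) is a 2-D Gaussian with mean vector u and diagonal covariance σ. The
comparison is conducted in four settings; in each of them, we perturb the model parameters
of G and observe how the three measures ($D_{L_2}$, $D_{Seq}$, and $D_M$) react to the
changes.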

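In setting one, G has exactly the same component Gaussians as F with variable
mixture weights, controlled through μ,

$$G = \mu\,N\!\left( \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) + (1-\mu)\,N\!\left( \begin{bmatrix} -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) \tag{23}$$

where μ varies between 0 and 0.5.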
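In setting two, the two component Gaussians of G have the same weights and covariances
as those of F but variable mean vectors, which move along a circle of radius one,
controlled through a,

$$G = 0.5\,N\!\left( \begin{bmatrix} \cos a \\ \sin a \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) + 0.5\,N\!\left( \begin{bmatrix} -\cos a \\ -\sin a \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) \tag{24}$$

where a is in the range of 0 to π.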
Setting three is similar to setting two except that we vary the mean vectors of G
symmetrically in the first dimension,

$$G = 0.5\,N\!\left( \begin{bmatrix} m \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) + 0.5\,N\!\left( \begin{bmatrix} -m \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) \tag{25}$$

where m is from 0.5 to 1.5.

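In setting four, G has the same weights and mean vectors as F for both components, but
with the covariance vector [δ, δ] changing along both dimensions simultaneously,

$$G = 0.5\,N\!\left( \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} \delta \\ \delta \end{bmatrix} \right) + 0.5\,N\!\left( \begin{bmatrix} -1 \\ 0 \end{bmatrix}, \begin{bmatrix} \delta \\ \delta \end{bmatrix} \right) \tag{26}$$

where δ ranges from 0.5 to 1.5.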

Figure 2 shows the behavior plots of the three measures under the four settings.
All the curves are normalized so that the maximum distance is 1. From these plots, one can
see that the overall behavior of all three measures is consistent in every tested setting.
The $D_M$ curve overlaps with $D_{L_2}$ in setting one and in part of setting three. In
setting four, $D_M$ falls between $D_{L_2}$ and $D_{Seq}$. These plots demonstrate that the
proposed new metric behaves similarly in different scenarios to the existing measures, which
have been widely used in practice. However, the proposed metric is much more efficient to
compute. In addition, with this metric there is no need to store or generate data points in
order to compare two PDF's. This is significant, particularly in content-based search and
retrieval, where large amounts of data are pre-indexed, stored (preferably in a succinct
parametric form), and searched in real time. For example, to retrieve the speech segments
of a particular speaker from a large database based on a given query example, all the
pre-stored speaker segments in the database have to be matched against the given query
sample. In this case, a measure that compares similarity directly from the speaker model
parameters is much more efficient than one that requires data points to be generated from
the models first and then compared. This is especially true when the search space is large,
a realistic scenario in almost all information retrieval tasks.
IV. Enabling More Efficient Audio Based Content Retrieval

To further demonstrate the usefulness of the proposed measure in practical
applications, we apply it to the real problem of audio-based query-by-example. Given a
database of audio events, the task is to search for and retrieve a given type of audio event
specified by a query example. Each audio event in the database is stored as the parameters
of a GMM with diagonal covariance matrices. By choosing appropriate element distance
measures, we demonstrate that the query/retrieval task using our proposed metric yields
comparable performance with more efficient computation.

In our experiment, a database containing 278 audio events is constructed from 7
hours of NBC Nightly News programs. Each event is an acoustically homogeneous segment,
such as a segment of speech from a particular speaker or a piece of music. A set of
acoustic features (Root Mean Square energy and 12 Mel-Frequency Cepstral Coefficients)
is extracted from the audio signal and fitted by a four-mixture GMM, whose parameters are
stored in the database. Q. Huang, Z. Liu, A. Rosenberg, D. Gibbon, B. Shahraray,
"Automated Generation of News Content Hierarchy by Integrating Audio, Video, and Text
Information," Proc. of IEEE ICASSP 99, Vol. IV, pp. 3025-28. During a query, an audio
segment is provided by the user as the query example, and the retrieval process finds all
the audio segments in the database that have acoustic properties similar to the query
example. For example, if the query sample is a piece of speech from former President
Clinton, the task is to find all Clinton speech segments in the database.

Using the given query example, a four-mixture GMM is built and compared with all
the other GMM's stored in the database. Two categories of measures are used to compare
the GMM models. One is the sequence-based distance measure $D_{Seq}$ and the other is
the distance measure $D_M$ described herein. Since our distance measure uses an element
distance measure as a building block, we choose, in this experiment, two types of element
distance measures to show that the proposed framework has the flexibility to adapt to
different application needs. One element distance measure is the $L_1$ norm and the other
is the $L_2$ norm. Both satisfy all three distance properties. Formally, the distances
between f and g can be written as

$$d_{L_1}(f,g) = \sum_{i=1}^{N} \left( |\mu_i^f - \mu_i^g| + |\sigma_i^f - \sigma_i^g| \right), \qquad d_{L_2}(f,g) = \sqrt{ \sum_{i=1}^{N} \left( (\mu_i^f - \mu_i^g)^2 + (\sigma_i^f - \sigma_i^g)^2 \right) } \tag{27}$$

where N is the feature dimension, and $\mu_i^f$, $\mu_i^g$, $\sigma_i^f$, and $\sigma_i^g$ are
the i-th means and standard deviations of f and g.
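A minimal Python sketch of the two element distances in (27), assuming each diagonal Gaussian is passed as separate lists of per-dimension means and standard deviations (a hypothetical layout chosen for illustration). Either function could supply the `d(g_i, h_k)` entries of the optimization in (10)-(12).

```python
import math


def d_l1(mean_f, std_f, mean_g, std_g):
    """L1 element distance of eq. (27) between diagonal Gaussians f and g."""
    return (sum(abs(a - b) for a, b in zip(mean_f, mean_g))
            + sum(abs(a - b) for a, b in zip(std_f, std_g)))


def d_l2(mean_f, std_f, mean_g, std_g):
    """L2 element distance of eq. (27): Euclidean norm over means and stds."""
    sq = (sum((a - b) ** 2 for a, b in zip(mean_f, mean_g))
          + sum((a - b) ** 2 for a, b in zip(std_f, std_g)))
    return math.sqrt(sq)


# Two 2-D components with equal standard deviations and means 5 apart.
print(d_l1([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0]))  # prints 7.0
print(d_l2([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0]))  # prints 5.0
```

Because both are norms of the difference of the stacked parameter vectors, they are non-negative, symmetric and obey the triangular inequality.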

Even though the mean and standard deviation may have very different dynamic
ranges, the choice is reasonable for this application, because when one range is much larger
than the other, the impact of the smaller one on the overall distance value is negligible.
Plugging in the two chosen element distance measures, the $L_1$ norm and the $L_2$ norm, we
obtain two measures, denoted by $D_{ME1}$ and $D_{ME2}$. Using each of $D_{Seq}$, $D_{ME1}$,
and $D_{ME2}$, we compute the distance between the given query example and each of the
audio events in the database. When the distance is smaller than a threshold (which can be
set by the user), the corresponding audio event is considered a hit.

To evaluate the retrieval performance, we use the Recall Rate (RR) and the False detection
Rate (FR), which are defined as follows. Assume that there is a total of T recorded events in
the database. Given a query example, there are Q events in the database that are true
matches. If the retrieval process returns R events as query results, among which C events
are correct matches, then RR is defined as C/Q, and FR is defined as (R-C)/(T-Q). Similar
to the Receiver Operating Characteristic (ROC) in classical detection theory, H. L. Van
Trees, Detection, Estimation, and Modulation Theory (John Wiley & Sons, 1967), we can
plot a 2-D graph (similar to the PF-PD graph in detection theory), as in Figure 3, to
visualize the retrieval performance.
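The thresholding rule and the RR/FR definitions above can be sketched in a few lines of Python. The dictionary of per-event distances, the event identifiers and the function name `retrieval_rates` are hypothetical stand-ins; sweeping the threshold and plotting the resulting (FR, RR) pairs would trace a curve of the kind shown in Figure 3.

```python
def retrieval_rates(distances, relevant, threshold):
    """Return (RR, FR) for one query.

    distances: {event_id: distance to the query} for all T events.
    relevant: set of event ids that truly match the query (Q events).
    An event is a hit when its distance is smaller than the threshold.
    """
    T, Q = len(distances), len(relevant)
    hits = {e for e, dist in distances.items() if dist < threshold}
    R = len(hits)             # events returned
    C = len(hits & relevant)  # correct matches among them
    return C / Q, (R - C) / (T - Q)


# Hypothetical distances from one query to four stored events.
dists = {"a": 0.1, "b": 0.5, "c": 0.2, "d": 0.9}
for thr in (0.05, 0.3, 1.0):
    print(thr, retrieval_rates(dists, {"a", "b"}, thr))
```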

The query is for a particular speaker, the anchor of NBC Nightly News, Tom
Brokaw. In the database, there are 55 segments of Tom Brokaw's speech. We use each of
them as a query example and compute the corresponding FR-RR graph. Figure 3 shows the
average FR-RR graph over all the queries. As can be seen from the figure, $D_{ME1}$ and
$D_{ME2}$ display performance similar to $D_{Seq}$. When FR < 0.11, $D_{ME2}$ is slightly
worse than $D_{Seq}$, and $D_{ME2}$ is slightly better than $D_{Seq}$ when FR > 0.11. When
computing $D_{Seq}$, we choose a testing sequence length of 5000. For each query, the
computation cost of $D_{Seq}$ is 25 times that of $D_{ME1}$ and $D_{ME2}$. Taking into
account the significant reduction in computation, the proposed new metric outperforms the
existing ones.

Preferably, the present invention is implemented using a computer system 402 as
shown in Fig. 4. This computer includes central processing unit ("CPU") 403, memory unit
404, one or more storage devices 406, one or more input devices 408, display device 410,
and communication interface 412. A system bus 414 is provided for communicating
between the above elements. Another output device, such as printer 416, may also be
included as part of system 402.




This computer illustratively is an IBM compatible personal computer, but one
skilled in the art will understand that the system is not limited to a particular size, class or
model of computer. CPU 403 illustratively is one or more microprocessors such as the
Pentium™ class of microprocessors available from Intel. Memory unit 404 typically
includes both some random access memory (RAM) and some read only memory (ROM).
Input devices 408, which illustratively include a keyboard, a mouse, and/or other
similar devices, receive data. The inputted data is stored in storage device 406. Storage
devices 406 illustratively include one or more removable or fixed disk drives, compact
discs, DVDs, or tapes. Output device 410 illustratively is a computer display, such as a
CRT monitor, LED display or LCD display. Communication interface 412 may be a
modem, a network interface, or other connection to external electronic devices, such as a
serial or parallel port. For some applications of the invention it is anticipated that this
interface will include a connection to the Internet.

PDF data is entered into computer system 402 via input device 408 and/or
communication interface 412 and stored in storage device 406. Processor 403 calculates the
distance between PDF's in accordance with equations 8-12, following a suitable computer
program stored in memory unit 404 and/or storage device 406 that implements the solution
of these equations. Display 410 depicts the results.

As will be apparent to those skilled in the art, numerous modifications may be made
in the practice of our invention.